A Summarization System with Categorization of Document Sets

Authors

  • Chikashi Nobata
  • Satoshi Sekine
  • Kiyotaka Uchimoto
  • Hitoshi Isahara

Abstract

We participated in both the single-document and multi-document summarization tasks at TSC 2002. To apply our earlier summarization system, which is based on a sentence-extraction technique, to the multi-document summarization task, we added two modules to it: one categorizes document sets, and the other estimates the similarity between sentences. Document sets are categorized using extended named entity classes, which include event and facility types as well as the original classes. Our system uses this category information to decide how to use the similarity information. The similarity between sentences is measured with the Dice coefficient, and the results are used either to select a representative sentence from among similar sentences or to extract typical sentences from a given document set.
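The abstract names the Dice coefficient and two uses for it but gives no further detail. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that sentences are compared as sets of whitespace-separated words. It selects representatives with a greedy filter that keeps the highest-scoring sentence among similar ones, and it treats "typical" sentences as those with the highest mean similarity to the rest of the set. The function names, the 0.5 threshold, and the importance scores are hypothetical. The abstract also does not say which document-set categories trigger which mode.

```python
def tokens(sentence: str) -> set[str]:
    # Assumption: lowercase whitespace tokenization into a word set.
    # A real system would need a proper tokenizer or morphological analyzer.
    return set(sentence.lower().split())


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two word sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def select_representatives(sentences: list[str], scores: list[float],
                           threshold: float = 0.5) -> list[int]:
    """Visit sentences by descending (hypothetical) importance score and keep
    one only if its Dice similarity to every already-kept sentence is below
    the threshold, so each group of similar sentences yields one representative."""
    toks = [tokens(s) for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    kept: list[int] = []
    for i in order:
        if all(dice(toks[i], toks[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def typical_sentences(sentences: list[str], k: int) -> list[int]:
    """Rank sentences by mean Dice similarity to all other sentences in the
    set and return the indices of the top k."""
    toks = [tokens(s) for s in sentences]
    n = len(sentences)
    if n < 2:
        return list(range(n))[:k]
    centrality = [
        sum(dice(toks[i], toks[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    return sorted(range(n), key=lambda i: centrality[i], reverse=True)[:k]


# Toy document set with made-up importance scores.
sentences = [
    "the earthquake struck the city on monday",
    "an earthquake struck the city monday",
    "rescue teams arrived on tuesday",
]
scores = [0.9, 0.8, 0.5]

print(select_representatives(sentences, scores))  # drops the near-duplicate sentence 1
print(typical_sentences(sentences, 2))
```

In the toy set, sentences 0 and 1 share five of their six words each, giving a Dice value of about 0.83. The representative filter therefore keeps only sentence 0 from that pair. The centrality ranking, by contrast, places both of them above the unrelated sentence 2.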

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Summarization as Feature Selection for Document Categorization on Small Datasets

Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and discriminative information from the defined categories. Considering that training sets are extremely small in many classification tasks, in this paper we explore the use of unsupervised extractive summarization as a feature sele...

Exploiting Category-Specific Information for Multi-Document Summarization

We show that by making use of information common to document sets belonging to a common category, we can improve the quality of automatically extracted content in multi-document summaries. This simple property is widely applicable in multi-document summarization tasks, and can be encapsulated by the concept of category-specific importance (CSI). Our experiments show that CSI is a valuable metri...

CRL/NYU Summarization System at DUC-2004

We participated in two multi-document summarization tasks (Task 2 and Task 5) at the DUC-2004 formal run and evaluated the performance of our summarization system. Our system, which is based on sentence extraction, also uses a module to estimate similarity between sentences. The similarity information was used for either selecting the representative sentence among similar sentences or gathering key senten...

CRL/NYU System at DUC-2004

We participated in two multi-document summarization tasks (Task 2 and Task 5) at the DUC-2004 formal run and evaluated the performance of our summarization system. Our system, which is based on sentence extraction, also uses a module to estimate similarity between sentences. The similarity information was used for either selecting the representative sentence among similar sentences or gathering key senten...

Publication date: 2002